Techniques for the Phonetic Description of Emotional Speech

Author

  • Peter Roach
Abstract

It is inconceivable that there could be information present in the speech signal that could be detected by the human auditory system but which is not accessible to acoustic analysis and phonetic categorisation. We know that humans can reliably recognise a range of emotions produced by speakers of their own language on the basis of the acoustic signal alone, yet our ability to identify the relevant acoustic correlates appears at present to be rather limited. This paper proposes that we have to build a bridge between the human perceptual experience and the measurable properties of the acoustic signal by developing an analytic framework based partly on auditory analysis. A possible framework is outlined which is based on the work of the Reading/Leeds Emotional Speech Database.

The project was funded by ESRC Grant no. R000235285.

1. THE NEED FOR CODING

The detailed study of large amounts of emotional speech presents the researcher with two main requirements: one is a theory of emotions and their categorisation, and the other is a method for transcribing those aspects of speech which are believed to be relevant in the signalling and recognition of emotions. This paper is concerned exclusively with the latter, though some of the work reported here has been concerned with a transcription system which entails both (Greasley et al., 1995 [1]). In devising a transcription for the description of emotional speech, certain requirements need to be satisfied. The principal requirements are presented below.

(i) The transcription system must allow exhaustive and unambiguous coding of all features of speech which could possibly be relevant in the signalling and recognition of emotions in speech.
(ii) The system should ideally use features which are capable of being defined in measurable acoustic terms.
(iii) Transcription, storage and retrieval of emotional speech data using the transcription system should be made as efficient and ergonomically practical as possible.
(iv) Inter-transcriber reliability should be given high priority.
(v) As far as possible, the transcription system should be compatible with existing systems.

In the following sections, we consider how the above requirements may be met.

2. PHONETIC FEATURES

Our view of the territory in which we are working was for much of the twentieth century obscured by a persistent and highly misleading conception of the role of prosody in speech. This conception rested on two linked assumptions:

(i) intonation is used by speakers to convey their emotions and attitudes;
(ii) intonation consists of variations in pitch which may be observed scientifically by measuring fundamental frequency.

It was recognised at least as early as the 1950s that this view was fundamentally incorrect, yet it persists in the contemporary literature. In the transcription of psychiatric interviews it was recognised that a description based solely on pitch variation was incapable of capturing the rich variety of relevant phonetic features (Trager, 1958 [2]; Pittenger et al., 1960 [3]). At the outset of the work on the Survey of English Usage, Crystal and Quirk (1964 [4]) recognised that a full transcription of the data, one likely to capture all relevant phonetic information, must use an analytic framework that went far beyond the simple intonation descriptions of the time (e.g. the first edition of O’Connor and Arnold [5]). This point of view was set out in much fuller form in Crystal (1969 [6]).

While theories of prosodic and paralinguistic features were being developed at this time in the context of linguistic description, other researchers were developing descriptive frameworks with a more sociological orientation. Laver (1968 [7]) is a good example of this movement; other work reprinted in Laver and Hutcheson (1972 [8]) also makes valuable reading. It is surprising that some of the most influential work of recent times in the field of prosody such as the work on Discourse

ISCA Archive
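Requirement (iv) in Section 1 asks that inter-transcriber reliability be given high priority. The text does not name a reliability measure, so the following is only an illustrative sketch under that assumption: it computes Cohen's kappa, one widely used chance-corrected agreement statistic, for two transcribers who have each assigned one label per utterance. The function name, the label set and the six example labels are hypothetical and are not taken from the Reading/Leeds database.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)

    # Proportion of items on which the two transcribers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each transcriber's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both transcribers used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six utterances (illustration only).
transcriber_1 = ["anger", "anger", "sadness", "neutral", "neutral", "fear"]
transcriber_2 = ["anger", "sadness", "sadness", "neutral", "neutral", "fear"]

kappa = cohens_kappa(transcriber_1, transcriber_2)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. A multi-tier transcription of the kind discussed here would need one such comparison per tier, or a measure designed for more than two transcribers.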


Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...


Stages and Method of Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System

Abstract Speech databases are part of the concatenative text to speech synthesis systems. Phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two syllable and diphone speech databases for Persian and investigates the way of their development and their specifications and their advantages to each other. ...


Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...


Phonetics of Emotion in Russian Speech

This paper provides a description of the structure and goals of the database of Russian Language Affective (emotional) utterances. It also reports some preliminary results of an experimental phonetic analysis of the acoustic characteristics of emotional utterances (surprise, happiness, anger, sadness and fear) vs. neutral ones in Russian. The study utilizes 600 database utterances by 10 speaker...


On the Relationship between Emotional Intelligence and Directive Speech Acts Preference

Language and emotion are two related systems in use, in that one system (emotions) impacts the performance of the other (language). Both of them share their functionality in communication. Since the nature of foreign language classrooms is ideally interactional, emotional intelligence (EI) gains importance. The aim of this study was to find out whether one's total emotional quotient and its com...



Publication date: 2000